This is my attempt to create a document out of the in-class coding scripts.

Step 1 : Load the tidyverse library and read the data

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
indata <- read.csv("~/Desktop/FertilityRates.csv")

Step 2 : Check whether the data loaded correctly

dim(indata)
## [1] 219  56

From the above output we can see there are 219 rows and 56 columns

Step 3 : Check the first 6 rows of the data with head() and the last 6 rows with tail()

head(indata)
##           Country.Name Country.Code                           Indicator.Name
## 1                Aruba          ABW Fertility rate, total (births per woman)
## 2              Andorra          AND Fertility rate, total (births per woman)
## 3          Afghanistan          AFG Fertility rate, total (births per woman)
## 4               Angola          AGO Fertility rate, total (births per woman)
## 5              Albania          ALB Fertility rate, total (births per woman)
## 6 United Arab Emirates          ARE Fertility rate, total (births per woman)
##   Indicator.Code X1960 X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969
## 1 SP.DYN.TFRT.IN 4.820 4.655 4.471 4.271 4.059 3.842 3.625 3.417 3.226 3.054
## 2 SP.DYN.TFRT.IN    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3 SP.DYN.TFRT.IN 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671
## 4 SP.DYN.TFRT.IN 7.316 7.354 7.385 7.410 7.425 7.430 7.422 7.403 7.375 7.339
## 5 SP.DYN.TFRT.IN 6.186 6.076 5.956 5.833 5.711 5.594 5.483 5.376 5.268 5.160
## 6 SP.DYN.TFRT.IN 6.928 6.910 6.893 6.877 6.861 6.841 6.816 6.783 6.738 6.679
##   X1970 X1971 X1972 X1973 X1974 X1975 X1976 X1977 X1978 X1979 X1980 X1981 X1982
## 1 2.908 2.788 2.691 2.613 2.552 2.506 2.472 2.446 2.425 2.408 2.392 2.377 2.364
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3 7.671 7.671 7.671 7.671 7.671 7.671 7.670 7.670 7.670 7.669 7.669 7.670 7.671
## 4 7.301 7.264 7.232 7.208 7.192 7.185 7.186 7.189 7.194 7.197 7.200 7.201 7.203
## 5 5.050 4.933 4.809 4.677 4.538 4.393 4.244 4.094 3.947 3.807 3.678 3.562 3.460
## 6 6.605 6.512 6.402 6.279 6.146 6.009 5.873 5.744 5.624 5.517 5.423 5.344 5.274
##   X1983 X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991 X1992 X1993 X1994 X1995
## 1 2.353 2.342 2.332 2.320 2.307 2.291 2.272 2.249 2.221 2.187 2.149 2.108 2.064
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3 7.673 7.676 7.679 7.681 7.682 7.682 7.682 7.687 7.700 7.725 7.758 7.796 7.832
## 4 7.205 7.207 7.208 7.206 7.202 7.194 7.182 7.165 7.143 7.116 7.087 7.054 7.019
## 5 3.372 3.297 3.233 3.177 3.126 3.075 3.023 2.970 2.917 2.867 2.819 2.772 2.723
## 6 5.209 5.141 5.065 4.973 4.860 4.724 4.566 4.388 4.193 3.989 3.784 3.583 3.393
##   X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008
## 1 2.021 1.979 1.940 1.905 1.874 1.848 1.825 1.805 1.786 1.769 1.754 1.739 1.726
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 1.240 1.180 1.250
## 3 7.859 7.869 7.854 7.809 7.733 7.623 7.484 7.321 7.136 6.930 6.702 6.456 6.196
## 4 6.984 6.949 6.913 6.878 6.844 6.811 6.778 6.743 6.704 6.657 6.598 6.523 6.434
## 5 2.670 2.611 2.543 2.467 2.383 2.291 2.195 2.097 2.004 1.919 1.849 1.796 1.761
## 6 3.215 3.052 2.902 2.766 2.644 2.532 2.428 2.329 2.236 2.149 2.071 2.004 1.948
##   X2009 X2010 X2011
## 1 1.713 1.701 1.690
## 2 1.190 1.220    NA
## 3 5.928 5.659 5.395
## 4 6.331 6.218 6.099
## 5 1.744 1.741 1.748
## 6 1.903 1.868 1.841
tail(indata)
##         Country.Name Country.Code                           Indicator.Name
## 214            Samoa          WSM Fertility rate, total (births per woman)
## 215      Yemen, Rep.          YEM Fertility rate, total (births per woman)
## 216     South Africa          ZAF Fertility rate, total (births per woman)
## 217 Congo, Dem. Rep.          COD Fertility rate, total (births per woman)
## 218           Zambia          ZMB Fertility rate, total (births per woman)
## 219         Zimbabwe          ZWE Fertility rate, total (births per woman)
##     Indicator.Code X1960 X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969
## 214 SP.DYN.TFRT.IN 7.651 7.645 7.630 7.604 7.567 7.519 7.462 7.400 7.334 7.265
## 215 SP.DYN.TFRT.IN 7.292 7.308 7.328 7.354 7.385 7.418 7.447 7.471 7.491 7.512
## 216 SP.DYN.TFRT.IN 6.173 6.144 6.103 6.049 5.984 5.911 5.836 5.765 5.700 5.643
## 217 SP.DYN.TFRT.IN 6.001 6.015 6.030 6.048 6.067 6.089 6.111 6.135 6.161 6.187
## 218 SP.DYN.TFRT.IN 7.018 7.071 7.127 7.184 7.240 7.292 7.338 7.376 7.405 7.425
## 219 SP.DYN.TFRT.IN 7.158 7.215 7.267 7.311 7.347 7.373 7.391 7.403 7.411 7.415
##     X1970 X1971 X1972 X1973 X1974 X1975 X1976 X1977 X1978 X1979 X1980 X1981
## 214 7.194 7.119 7.039 6.952 6.859 6.761 6.656 6.547 6.434 6.320 6.203 6.086
## 215 7.542 7.593 7.672 7.782 7.922 8.089 8.277 8.474 8.667 8.843 8.993 9.108
## 216 5.591 5.539 5.482 5.415 5.338 5.251 5.159 5.064 4.969 4.877 4.786 4.695
## 217 6.214 6.242 6.271 6.301 6.333 6.366 6.403 6.443 6.487 6.535 6.585 6.636
## 218 7.437 7.443 7.446 7.447 7.444 7.435 7.414 7.378 7.326 7.259 7.182 7.098
## 219 7.417 7.419 7.420 7.417 7.410 7.395 7.369 7.330 7.273 7.196 7.095 6.967
##     X1982 X1983 X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991 X1992 X1993
## 214 5.968 5.850 5.734 5.620 5.510 5.404 5.303 5.208 5.118 5.034 4.956 4.882
## 215 9.185 9.223 9.223 9.186 9.119 9.030 8.925 8.805 8.667 8.504 8.311 8.088
## 216 4.602 4.504 4.402 4.293 4.177 4.052 3.923 3.791 3.658 3.530 3.408 3.296
## 217 6.689 6.742 6.794 6.847 6.902 6.960 7.019 7.077 7.133 7.183 7.223 7.251
## 218 7.015 6.937 6.865 6.799 6.737 6.673 6.606 6.537 6.468 6.401 6.340 6.288
## 219 6.811 6.633 6.435 6.223 6.004 5.784 5.569 5.365 5.176 5.001 4.840 4.690
##     X1994 X1995 X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004 X2005
## 214 4.815 4.751 4.692 4.637 4.587 4.541 4.503 4.476 4.460 4.454 4.456 4.460
## 215 7.841 7.578 7.310 7.049 6.802 6.574 6.363 6.166 5.975 5.782 5.588 5.393
## 216 3.196 3.110 3.040 2.983 2.937 2.899 2.866 2.834 2.801 2.763 2.721 2.675
## 217 7.267 7.267 7.253 7.227 7.190 7.143 7.089 7.027 6.960 6.887 6.809 6.728
## 218 6.245 6.209 6.179 6.152 6.126 6.098 6.071 6.044 6.018 5.995 5.974 5.954
## 219 4.554 4.432 4.328 4.240 4.169 4.112 4.069 4.039 4.018 4.002 3.987 3.969
##     X2006 X2007 X2008 X2009 X2010 X2011
## 214 4.460 4.450 4.426 4.389 4.338 4.277
## 215 5.199 5.010 4.829 4.658 4.498 4.348
## 216 2.627 2.580 2.538 2.500 2.467 2.438
## 217 6.642 6.550 6.454 6.354 6.251 6.146
## 218 5.932 5.908 5.881 5.849 5.813 5.773
## 219 3.941 3.903 3.853 3.792 3.721 3.643

Data looks as expected
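
The year columns show up as X1960 ... X2011 because read.csv() by default makes column names syntactically valid, and a name that starts with a digit gets an "X" in front. The sketch below illustrates this on a tiny inline CSV; the two rows and two year columns are a made-up stand-in for the real file, not its actual layout.

```r
# Toy CSV text standing in for the real file (assumed layout, illustration only)
csv_text <- "Country.Name,1960,1961\nAruba,4.820,4.655\nAngola,7.316,7.354"

# Default behaviour: check.names = TRUE turns "1960" into "X1960"
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the header exactly as written in the file
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(raw_names)
```

Keeping the default is convenient here, because names such as X1960 can be used directly in select() and pivot_longer() without backticks.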

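Step 4 : Check whether the data is a data frame

is.data.frame() returns TRUE for a base data frame. It also returns TRUE for a tibble, because a tibble is a special kind of data frame, so this check only confirms that the object is tabular. To tell the two apart we need is_tibble(), which is used in Step 5; if the data is not a tibble, we convert it with as_tibble().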

is.data.frame(indata)
## [1] TRUE

The output is TRUE, so indata is a data frame.

Step 5 : Check whether the data is a tibble

is_tibble(indata)
## [1] FALSE

The output is FALSE, so indata is not a tibble.

Step 6 : Convert it into a tibble by using a function called as_tibble

indata <- as_tibble(indata)
is_tibble(indata)
## [1] TRUE

After converting, the output is TRUE, so indata is now a tibble.

Step 7 : Check the summary of the data
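
A tibble is itself a data frame, so is.data.frame() cannot separate the two; only is_tibble() can. Here is a minimal sketch on a made-up two-row data frame (the values are illustrative only):

```r
library(tibble)

# A small base data frame (values are made up for illustration)
df <- data.frame(Country.Name = c("Aruba", "Angola"), X1960 = c(4.82, 7.316))
tb <- as_tibble(df)

# is.data.frame() is TRUE for both objects
both_are_data_frames <- is.data.frame(df) && is.data.frame(tb)

# is_tibble() is the check that actually distinguishes them
only_tb_is_tibble <- !is_tibble(df) && is_tibble(tb)

# The class vector of a tibble still includes "data.frame"
print(class(tb))
```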

summary(indata)
##  Country.Name       Country.Code       Indicator.Name     Indicator.Code    
##  Length:219         Length:219         Length:219         Length:219        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      X1960           X1961           X1962           X1963      
##  Min.   :1.940   Min.   :1.940   Min.   :1.790   Min.   :1.810  
##  1st Qu.:4.210   1st Qu.:4.027   1st Qu.:4.123   1st Qu.:4.057  
##  Median :6.179   Median :6.144   Median :6.122   Median :6.104  
##  Mean   :5.512   Mean   :5.492   Mean   :5.492   Mean   :5.488  
##  3rd Qu.:6.803   3rd Qu.:6.803   3rd Qu.:6.821   3rd Qu.:6.821  
##  Max.   :8.187   Max.   :8.194   Max.   :8.197   Max.   :8.198  
##  NA's   :25      NA's   :24      NA's   :25      NA's   :26     
##      X1964           X1965           X1966           X1967      
##  Min.   :1.790   Min.   :1.740   Min.   :1.580   Min.   :1.800  
##  1st Qu.:3.950   1st Qu.:3.821   1st Qu.:3.644   1st Qu.:3.549  
##  Median :6.061   Median :6.079   Median :6.045   Median :5.995  
##  Mean   :5.442   Mean   :5.392   Mean   :5.338   Mean   :5.294  
##  3rd Qu.:6.801   3rd Qu.:6.799   3rd Qu.:6.795   3rd Qu.:6.747  
##  Max.   :8.198   Max.   :8.198   Max.   :8.198   Max.   :8.201  
##  NA's   :25      NA's   :25      NA's   :25      NA's   :25     
##      X1968           X1969           X1970           X1971      
##  Min.   :1.830   Min.   :1.851   Min.   :1.828   Min.   :1.703  
##  1st Qu.:3.394   1st Qu.:3.247   1st Qu.:3.093   1st Qu.:2.999  
##  Median :5.912   Median :5.798   Median :5.745   Median :5.681  
##  Mean   :5.240   Mean   :5.187   Mean   :5.134   Mean   :5.074  
##  3rd Qu.:6.738   3rd Qu.:6.687   3rd Qu.:6.693   3rd Qu.:6.665  
##  Max.   :8.207   Max.   :8.217   Max.   :8.231   Max.   :8.252  
##  NA's   :25      NA's   :25      NA's   :25      NA's   :24     
##      X1972           X1973           X1974           X1975      
##  Min.   :1.593   Min.   :1.504   Min.   :1.510   Min.   :1.450  
##  1st Qu.:3.026   1st Qu.:2.943   1st Qu.:2.851   1st Qu.:2.712  
##  Median :5.521   Median :5.416   Median :5.340   Median :5.234  
##  Mean   :5.022   Mean   :4.962   Mean   :4.904   Mean   :4.838  
##  3rd Qu.:6.713   3rd Qu.:6.710   3rd Qu.:6.676   3rd Qu.:6.674  
##  Max.   :8.278   Max.   :8.307   Max.   :8.339   Max.   :8.370  
##  NA's   :23      NA's   :25      NA's   :25      NA's   :25     
##      X1976           X1977           X1978           X1979      
##  Min.   :1.440   Min.   :1.400   Min.   :1.380   Min.   :1.380  
##  1st Qu.:2.591   1st Qu.:2.506   1st Qu.:2.472   1st Qu.:2.453  
##  Median :5.159   Median :5.093   Median :5.022   Median :4.884  
##  Mean   :4.791   Mean   :4.713   Mean   :4.657   Mean   :4.610  
##  3rd Qu.:6.635   3rd Qu.:6.582   3rd Qu.:6.535   3rd Qu.:6.483  
##  Max.   :8.399   Max.   :8.474   Max.   :8.667   Max.   :8.843  
##  NA's   :25      NA's   :25      NA's   :25      NA's   :25     
##      X1980           X1981           X1982           X1983      
##  Min.   :1.440   Min.   :1.430   Min.   :1.410   Min.   :1.330  
##  1st Qu.:2.401   1st Qu.:2.370   1st Qu.:2.390   1st Qu.:2.346  
##  Median :4.755   Median :4.607   Median :4.505   Median :4.456  
##  Mean   :4.559   Mean   :4.495   Mean   :4.436   Mean   :4.393  
##  3rd Qu.:6.439   3rd Qu.:6.364   3rd Qu.:6.309   3rd Qu.:6.272  
##  Max.   :8.993   Max.   :9.108   Max.   :9.185   Max.   :9.223  
##  NA's   :25      NA's   :23      NA's   :20      NA's   :23     
##      X1984           X1985           X1986           X1987      
##  Min.   :1.290   Min.   :1.370   Min.   :1.340   Min.   :1.280  
##  1st Qu.:2.294   1st Qu.:2.301   1st Qu.:2.257   1st Qu.:2.274  
##  Median :4.359   Median :4.224   Median :4.112   Median :3.985  
##  Mean   :4.340   Mean   :4.279   Mean   :4.219   Mean   :4.149  
##  3rd Qu.:6.220   3rd Qu.:6.186   3rd Qu.:6.061   3rd Qu.:5.905  
##  Max.   :9.223   Max.   :9.186   Max.   :9.119   Max.   :9.030  
##  NA's   :23      NA's   :23      NA's   :23      NA's   :19     
##      X1988           X1989           X1990           X1991      
##  Min.   :1.320   Min.   :1.280   Min.   :1.260   Min.   :1.270  
##  1st Qu.:2.232   1st Qu.:2.200   1st Qu.:2.180   1st Qu.:2.121  
##  Median :3.908   Median :3.752   Median :3.558   Median :3.485  
##  Mean   :4.099   Mean   :4.024   Mean   :3.956   Mean   :3.875  
##  3rd Qu.:5.840   3rd Qu.:5.727   3rd Qu.:5.577   3rd Qu.:5.442  
##  Max.   :8.925   Max.   :8.805   Max.   :8.667   Max.   :8.504  
##  NA's   :23      NA's   :23      NA's   :20      NA's   :20     
##      X1992           X1993           X1994           X1995      
##  Min.   :1.290   Min.   :1.250   Min.   :1.200   Min.   :1.180  
##  1st Qu.:2.120   1st Qu.:2.027   1st Qu.:1.960   1st Qu.:1.879  
##  Median :3.330   Median :3.279   Median :3.147   Median :3.082  
##  Mean   :3.791   Mean   :3.729   Mean   :3.645   Mean   :3.561  
##  3rd Qu.:5.259   3rd Qu.:5.191   3rd Qu.:5.056   3rd Qu.:4.956  
##  Max.   :8.311   Max.   :8.088   Max.   :7.841   Max.   :7.832  
##  NA's   :18      NA's   :21      NA's   :20      NA's   :18     
##      X1996           X1997           X1998           X1999      
##  Min.   :1.150   Min.   :1.090   Min.   :1.017   Min.   :0.982  
##  1st Qu.:1.913   1st Qu.:1.854   1st Qu.:1.806   1st Qu.:1.796  
##  Median :2.989   Median :2.869   Median :2.855   Median :2.804  
##  Mean   :3.517   Mean   :3.422   Mean   :3.379   Mean   :3.337  
##  3rd Qu.:4.864   3rd Qu.:4.636   3rd Qu.:4.593   3rd Qu.:4.545  
##  Max.   :7.859   Max.   :7.869   Max.   :7.854   Max.   :7.809  
##  NA's   :21      NA's   :17      NA's   :20      NA's   :19     
##      X2000           X2001           X2002           X2003      
##  Min.   :0.939   Min.   :0.891   Min.   :0.856   Min.   :0.838  
##  1st Qu.:1.778   1st Qu.:1.780   1st Qu.:1.759   1st Qu.:1.770  
##  Median :2.679   Median :2.614   Median :2.558   Median :2.524  
##  Mean   :3.252   Mean   :3.199   Mean   :3.134   Mean   :3.103  
##  3rd Qu.:4.352   3rd Qu.:4.268   3rd Qu.:4.134   3rd Qu.:4.050  
##  Max.   :7.733   Max.   :7.704   Max.   :7.681   Max.   :7.658  
##  NA's   :17      NA's   :18      NA's   :15      NA's   :17     
##      X2004           X2005           X2006           X2007      
##  Min.   :0.836   Min.   :0.849   Min.   :0.874   Min.   :0.906  
##  1st Qu.:1.782   1st Qu.:1.775   1st Qu.:1.800   1st Qu.:1.797  
##  Median :2.518   Median :2.496   Median :2.425   Median :2.422  
##  Mean   :3.075   Mean   :3.038   Mean   :2.995   Mean   :2.961  
##  3rd Qu.:3.987   3rd Qu.:3.980   3rd Qu.:3.911   3rd Qu.:3.818  
##  Max.   :7.636   Max.   :7.617   Max.   :7.602   Max.   :7.593  
##  NA's   :18      NA's   :16      NA's   :14      NA's   :13     
##      X2008           X2009           X2010           X2011      
##  Min.   :0.939   Min.   :0.973   Min.   :1.003   Min.   :1.031  
##  1st Qu.:1.796   1st Qu.:1.800   1st Qu.:1.800   1st Qu.:1.796  
##  Median :2.384   Median :2.374   Median :2.344   Median :2.334  
##  Mean   :2.935   Mean   :2.904   Mean   :2.876   Mean   :2.854  
##  3rd Qu.:3.733   3rd Qu.:3.691   3rd Qu.:3.665   3rd Qu.:3.633  
##  Max.   :7.588   Max.   :7.585   Max.   :7.584   Max.   :7.581  
##  NA's   :14      NA's   :14      NA's   :15      NA's   :17

For each year column, the output gives the minimum, 1st quartile, median, mean, 3rd quartile, maximum, and the number of missing values (NA's). The four character columns report only their length and class.

Step 8 : Convert Country.Name into a factor by using as.factor()

indata$Country.Name <- as.factor(indata$Country.Name)
summary(indata$Country.Name)
##              Afghanistan                  Albania                  Algeria 
##                        1                        1                        1 
##           American Samoa                  Andorra                   Angola 
##                        1                        1                        1 
##      Antigua and Barbuda                Argentina                  Armenia 
##                        1                        1                        1 
##                    Aruba                Australia                  Austria 
##                        1                        1                        1 
##               Azerbaijan             Bahamas, The                  Bahrain 
##                        1                        1                        1 
##               Bangladesh                 Barbados                  Belarus 
##                        1                        1                        1 
##                  Belgium                   Belize                    Benin 
##                        1                        1                        1 
##                  Bermuda                   Bhutan                  Bolivia 
##                        1                        1                        1 
##   Bosnia and Herzegovina                 Botswana                   Brazil 
##                        1                        1                        1 
##        Brunei Darussalam                 Bulgaria             Burkina Faso 
##                        1                        1                        1 
##                  Burundi               Cabo Verde                 Cambodia 
##                        1                        1                        1 
##                 Cameroon                   Canada           Cayman Islands 
##                        1                        1                        1 
## Central African Republic                     Chad          Channel Islands 
##                        1                        1                        1 
##                    Chile                    China                 Colombia 
##                        1                        1                        1 
##                  Comoros         Congo, Dem. Rep.              Congo, Rep. 
##                        1                        1                        1 
##               Costa Rica            Cote d'Ivoire                  Croatia 
##                        1                        1                        1 
##                     Cuba                  Curacao                   Cyprus 
##                        1                        1                        1 
##           Czech Republic                  Denmark                 Djibouti 
##                        1                        1                        1 
##                 Dominica       Dominican Republic                  Ecuador 
##                        1                        1                        1 
##         Egypt, Arab Rep.              El Salvador        Equatorial Guinea 
##                        1                        1                        1 
##                  Eritrea                  Estonia                 Ethiopia 
##                        1                        1                        1 
##           Faeroe Islands                     Fiji                  Finland 
##                        1                        1                        1 
##                   France         French Polynesia                    Gabon 
##                        1                        1                        1 
##              Gambia, The                  Georgia                  Germany 
##                        1                        1                        1 
##                    Ghana                   Greece                Greenland 
##                        1                        1                        1 
##                  Grenada                     Guam                Guatemala 
##                        1                        1                        1 
##                   Guinea            Guinea-Bissau                   Guyana 
##                        1                        1                        1 
##                    Haiti                 Honduras     Hong Kong SAR, China 
##                        1                        1                        1 
##                  Hungary                  Iceland                    India 
##                        1                        1                        1 
##                Indonesia       Iran, Islamic Rep.                     Iraq 
##                        1                        1                        1 
##                  Ireland              Isle of Man                   Israel 
##                        1                        1                        1 
##                    Italy                  Jamaica                    Japan 
##                        1                        1                        1 
##                   Jordan               Kazakhstan                    Kenya 
##                        1                        1                        1 
##                  (Other) 
##                      120

Country.Name is now a factor, and summary() reports the number of rows for each level: every country appears exactly once. Levels beyond the display limit are collapsed into (Other).

Step 9 : Convert Indicator.Name into a factor by using as.factor()

indata$Indicator.Name <- as.factor(indata$Indicator.Name)
summary(indata$Indicator.Name)
## Fertility rate, total (births per woman) 
##                                      219

Indicator.Name is now a factor with a single level that covers all 219 rows.

Step 10 : Remove the columns that are not needed
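
To see what as.factor() and summary() do on a smaller scale, here is a sketch on a made-up character vector (the values are not from the dataset):

```r
# Toy character vector (illustrative values only)
indicator <- c("Fertility rate", "Fertility rate", "Birth rate")

indicator_f <- as.factor(indicator)

# Levels are the distinct values, in sorted order
print(levels(indicator_f))

# summary() on a factor returns the number of elements per level
level_counts <- summary(indicator_f)
print(level_counts)
```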

indata_cleaned <- select(indata, -(Country.Code:Indicator.Code))
head(indata_cleaned)
## # A tibble: 6 × 53
##   Country.Name X1960 X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970
##   <fct>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Aruba         4.82  4.66  4.47  4.27  4.06  3.84  3.62  3.42  3.23  3.05  2.91
## 2 Andorra      NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
## 3 Afghanistan   7.67  7.67  7.67  7.67  7.67  7.67  7.67  7.67  7.67  7.67  7.67
## 4 Angola        7.32  7.35  7.38  7.41  7.42  7.43  7.42  7.40  7.38  7.34  7.30
## 5 Albania       6.19  6.08  5.96  5.83  5.71  5.59  5.48  5.38  5.27  5.16  5.05
## 6 United Arab…  6.93  6.91  6.89  6.88  6.86  6.84  6.82  6.78  6.74  6.68  6.60
## # … with 41 more variables: X1971 <dbl>, X1972 <dbl>, X1973 <dbl>, X1974 <dbl>,
## #   X1975 <dbl>, X1976 <dbl>, X1977 <dbl>, X1978 <dbl>, X1979 <dbl>,
## #   X1980 <dbl>, X1981 <dbl>, X1982 <dbl>, X1983 <dbl>, X1984 <dbl>,
## #   X1985 <dbl>, X1986 <dbl>, X1987 <dbl>, X1988 <dbl>, X1989 <dbl>,
## #   X1990 <dbl>, X1991 <dbl>, X1992 <dbl>, X1993 <dbl>, X1994 <dbl>,
## #   X1995 <dbl>, X1996 <dbl>, X1997 <dbl>, X1998 <dbl>, X1999 <dbl>,
## #   X2000 <dbl>, X2001 <dbl>, X2002 <dbl>, X2003 <dbl>, X2004 <dbl>, …

Country.Code, Indicator.Name, and Indicator.Code have been removed, which leaves 53 of the original 56 columns.

Step 11 : Convert the wide dataset into a long one by using pivot_longer()
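
The expression -(Country.Code:Indicator.Code) drops a whole range of neighbouring columns at once. A minimal sketch on a made-up tibble with the same column names (the rows are illustrative only):

```r
library(dplyr)
library(tibble)

# Toy tibble with the same leading columns as the real data
toy <- tibble(
  Country.Name   = c("Aruba", "Angola"),
  Country.Code   = c("ABW", "AGO"),
  Indicator.Name = "Fertility rate, total (births per woman)",
  Indicator.Code = "SP.DYN.TFRT.IN",
  X1960          = c(4.820, 7.316)
)

# a:b selects every column positioned from a through b; the minus drops them
toy_cleaned <- select(toy, -(Country.Code:Indicator.Code))

print(names(toy_cleaned))
```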

indata_pivoted <- pivot_longer(indata_cleaned, c(str_c("X", 1960:2011)), names_to = "Year", values_to="Fertility.Rates")

head(indata_pivoted)
## # A tibble: 6 × 3
##   Country.Name Year  Fertility.Rates
##   <fct>        <chr>           <dbl>
## 1 Aruba        X1960            4.82
## 2 Aruba        X1961            4.66
## 3 Aruba        X1962            4.47
## 4 Aruba        X1963            4.27
## 5 Aruba        X1964            4.06
## 6 Aruba        X1965            3.84

The result has one row per country and year, with the fertility rate in its own column.

Step 12 : Format the Year values: remove the leading "X" with str_sub(), then convert the result into an integer with as.integer()

indata_pivoted$Year <- as.integer(str_sub(indata_pivoted$Year, 2,5))

After this step, Year holds integers such as 1960 instead of strings such as "X1960".

Step 13 : Count the missing values in the Fertility.Rates column by using is.na()
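
Steps 11 and 12 could also be done in a single pivot_longer() call, using its names_prefix and names_transform arguments. This is an alternative sketch on made-up data, not what the in-class script did:

```r
library(tidyr)
library(tibble)

# Toy wide data with two year columns (values are illustrative)
toy_wide <- tibble(
  Country.Name = c("Aruba", "Angola"),
  X1960 = c(4.820, 7.316),
  X1961 = c(4.655, 7.354)
)

toy_long <- pivot_longer(
  toy_wide,
  cols = starts_with("X"),
  names_to = "Year",
  names_prefix = "X",                         # strip the leading "X"
  names_transform = list(Year = as.integer),  # turn "1960" into 1960L
  values_to = "Fertility.Rates"
)

print(toy_long)
```

The two-step version in the script is easier to follow when learning; the one-call version avoids having a temporary character Year column.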

sum(is.na(indata_pivoted$Fertility.Rates))
## [1] 1104

Step 14 : Keep only the rows with a missing fertility rate, group them by country, and count the rows in each group

indata_pivoted %>%  filter(is.na(Fertility.Rates)) %>% group_by(Country.Name) %>% summarise(count = n())
## # A tibble: 27 × 2
##    Country.Name   count
##    <fct>          <int>
##  1 American Samoa    52
##  2 Andorra           47
##  3 Bermuda           43
##  4 Cayman Islands    52
##  5 Curacao           47
##  6 Dominica          45
##  7 Faeroe Islands    52
##  8 Greenland         30
##  9 Isle of Man       49
## 10 Kosovo            21
## # … with 17 more rows

27 countries have at least one missing fertility rate; the output shows the first 10 of them.

Step 15 : Within each country, fill the missing values by using fill() with .direction = "downup"
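
The same per-country count of missing values can be written a little shorter with count(), which combines group_by() and summarise(). A sketch on made-up data (country names "A" and "B" are placeholders):

```r
library(dplyr)
library(tibble)

# Toy long data: country A has two missing values, country B has none
toy_long <- tibble(
  Country.Name    = c("A", "A", "A", "B", "B", "B"),
  Year            = rep(1960:1962, times = 2),
  Fertility.Rates = c(NA, 4.7, NA, 7.3, 7.4, 7.4)
)

# Keep the NA rows, then count them per country
na_counts <- toy_long %>%
  filter(is.na(Fertility.Rates)) %>%
  count(Country.Name, name = "count")

print(na_counts)
```

Countries without any missing value do not appear in the result at all, which is why the real output lists 27 rows rather than 219.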

indata_filled <- indata_pivoted %>% group_by(Country.Name) %>% fill(Fertility.Rates, .direction = "downup") %>% ungroup()

Step 16 : Check whether any missing values remain

indata_filled %>%  filter(is.na(Fertility.Rates)) %>% group_by(Country.Name) %>% summarise(count = n())
## # A tibble: 9 × 2
##   Country.Name                            count
##   <fct>                                   <int>
## 1 American Samoa                             52
## 2 Cayman Islands                             52
## 3 Faeroe Islands                             52
## 4 Monaco                                     52
## 5 Northern Mariana Islands                   52
## 6 San Marino                                 52
## 7 Sub-Saharan Africa (IFC classification)    52
## 8 Turks and Caicos Islands                   52
## 9 Tuvalu                                     52
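
The 9 remaining countries have 52 missing values each, that is, no observed value in any year, so fill() has nothing to copy from. The sketch below shows both cases on made-up data (countries "A" and "B" are placeholders):

```r
library(dplyr)
library(tidyr)
library(tibble)

toy_long <- tibble(
  Country.Name    = rep(c("A", "B"), each = 4),
  Year            = rep(1960:1963, times = 2),
  Fertility.Rates = c(NA, 5.0, NA, 4.0,   # A: partly observed
                      NA, NA, NA, NA)     # B: never observed
)

# "downup" first carries the last seen value forward, then fills any
# leading gaps backward from the first observed value
toy_filled <- toy_long %>%
  group_by(Country.Name) %>%
  fill(Fertility.Rates, .direction = "downup") %>%
  ungroup()

print(toy_filled)
```

Grouping by country before fill() matters: without it, values from one country could spill into the next one in the table.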

Step 17 : Remove the rows that still have missing values

indata_filled <- indata_filled %>% filter(!is.na(Fertility.Rates))

Step 18 : Draw a scatter plot with Year on the x-axis and Fertility.Rates on the y-axis

ggplot(data = indata_filled) + geom_point(mapping = aes(x=Year, y = Fertility.Rates))

The chart above is not visually clear, so we need to improve its effectiveness.

Step 19 : Draw a scatter plot with a smoothing line

ggplot(data = indata_filled) + geom_point(mapping = aes(x=Year, y = Fertility.Rates), position = "jitter", alpha = 0.1) + geom_smooth(mapping = aes(x=Year, y = Fertility.Rates)) + labs(x = "Year", y = "Fertility Rates", title = "Fertility Rates over the years", subtitle = "Global fertility rates have decreased since 1960")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

The smoothing line shows that fertility rates decline over the period.

Step 20 : The plots above contain too many data points to read individual countries, so we narrow down the 210 countries in two ways. Option 1 : Subset the data to a few countries of interest.
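
The trend in the plot can also be cross-checked numerically by averaging the fertility rate per year. This sketch uses made-up numbers, so only the method carries over to the real data, not the values:

```r
library(dplyr)
library(tibble)

# Toy data: two placeholder countries observed in three years
toy_filled <- tibble(
  Country.Name    = rep(c("A", "B"), each = 3),
  Year            = rep(c(1960L, 1985L, 2011L), times = 2),
  Fertility.Rates = c(6.0, 4.0, 2.0, 7.0, 6.0, 5.0)
)

# One row per year with the mean over all countries
yearly_avg <- toy_filled %>%
  group_by(Year) %>%
  summarise(avg = mean(Fertility.Rates))

print(yearly_avg)
```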

indata_subset <- filter(indata_filled, Country.Name %in% c("United States", "Mexico", "Canada"))

ggplot(data = indata_subset) + geom_line(mapping = aes(x=Year, y = Fertility.Rates, color = Country.Name), alpha = 0.5) + labs(x = "Year", y = "Fertility Rates", title = "Fertility Rates over the years", subtitle = "Global fertility rates have decreased since 1960")

Fertility rates in these three countries have declined overall since 1960.

Option 2 : Select countries based on a statistic. Here we compute the average fertility rate of each country, sort the averages in descending order, and print the top 10 with print(n = 10)

indata_filled %>%  group_by(Country.Name) %>% summarise(avg = mean(Fertility.Rates)) %>% arrange(desc(avg)) %>% print(n = 10)
## # A tibble: 210 × 2
##    Country.Name   avg
##    <fct>        <dbl>
##  1 Niger         7.59
##  2 Afghanistan   7.47
##  3 Yemen, Rep.   7.43
##  4 Somalia       7.26
##  5 Rwanda        7.23
##  6 Burundi       7.19
##  7 Angola        7.06
##  8 Uganda        6.94
##  9 Mali          6.92
## 10 Chad          6.92
## # … with 200 more rows

Step 21 : Plot the 10 countries with the lowest averages, selected by using top_n(-10)

indata_filled %>%  group_by(Country.Name) %>% summarise(avg = mean(Fertility.Rates)) %>% arrange(desc(avg)) %>% top_n(-10) %>% ggplot() + geom_bar(mapping = aes(x = Country.Name, y = avg), stat = "identity") +coord_flip()
## Selecting by avg

From the above graph we can compare the averages of the 10 countries with the lowest fertility rates: Switzerland has the highest average among them, while Andorra has the lowest.

Step 22 : Load the countrycode library; if it is not installed, install it first
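
Newer dplyr code often uses slice_max() and slice_min() in place of top_n(), since they name the ranking column explicitly and return the rows already sorted. A sketch on made-up averages (the country names are placeholders):

```r
library(dplyr)
library(tibble)

# Toy per-country averages (illustrative values only)
toy_avg <- tibble(
  Country.Name = c("A", "B", "C", "D"),
  avg          = c(7.5, 1.4, 3.2, 6.9)
)

# Highest two averages, largest first
top_2 <- slice_max(toy_avg, avg, n = 2)

# Lowest two averages, smallest first
bottom_2 <- slice_min(toy_avg, avg, n = 2)

print(top_2)
print(bottom_2)
```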

library(countrycode)
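
One common way to install a package only when it is missing is to test for it with requireNamespace() first. The helper name ensure_package below is hypothetical and not part of the in-class script:

```r
# Hypothetical helper (not from the original script): installs a package
# only when it is not already available, then reports whether it can be loaded
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# In this document the call would be ensure_package("countrycode").
# "stats" ships with R, so this call never triggers an installation.
stats_available <- ensure_package("stats")
print(stats_available)
```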

Step 23 : Add a new variable called Continent.Name by mapping each country name to its continent

indata_df <- as.data.frame(indata_filled)
indata_df$Continent.Name <- factor(countrycode(sourcevar = indata_df[,"Country.Name"], origin = "country.name", destination = "continent"))
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Channel Islands, Kosovo, Latin America & Caribbean (all income levels), OECD members, Other small states, Pacific island small states

Step 24 : Add a new variable called Region.Name by mapping each country name to its region

indata_df$Region.Name <- factor(countrycode(sourcevar = indata_df[,"Country.Name"], origin = "country.name", destination = "region"))
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Latin America & Caribbean (all income levels), OECD members, Other small states, Pacific island small states
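
The warnings list names that countrycode() could not match, and those rows get NA. For a real country such as Kosovo, the custom_match argument can supply the value by hand. The choice of "Europe" below is an assumption made for this sketch, not something the in-class script specifies:

```r
library(countrycode)

countries <- c("Aruba", "Kosovo")

# custom_match is a named vector: the names are source values and the
# values are the destination codes to use for them.
# Assumption for this sketch: Kosovo is assigned to "Europe".
continents <- countrycode(
  sourcevar    = countries,
  origin       = "country.name",
  destination  = "continent",
  custom_match = c("Kosovo" = "Europe")
)

print(continents)
```

Entries such as "OECD members" or "Other small states" are aggregates rather than countries, so it may be more appropriate to filter them out than to assign them a continent.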

Step 25 : Inspect the result, where each row now carries a continent name and a region name

head(indata_df)
##   Country.Name Year Fertility.Rates Continent.Name               Region.Name
## 1        Aruba 1960           4.820       Americas Latin America & Caribbean
## 2        Aruba 1961           4.655       Americas Latin America & Caribbean
## 3        Aruba 1962           4.471       Americas Latin America & Caribbean
## 4        Aruba 1963           4.271       Americas Latin America & Caribbean
## 5        Aruba 1964           4.059       Americas Latin America & Caribbean
## 6        Aruba 1965           3.842       Americas Latin America & Caribbean

Step 26 : Convert the data back into a tibble by using as_tibble()

indata <- as_tibble(indata_df)
head(indata)
## # A tibble: 6 × 5
##   Country.Name  Year Fertility.Rates Continent.Name Region.Name              
##   <fct>        <int>           <dbl> <fct>          <fct>                    
## 1 Aruba         1960            4.82 Americas       Latin America & Caribbean
## 2 Aruba         1961            4.66 Americas       Latin America & Caribbean
## 3 Aruba         1962            4.47 Americas       Latin America & Caribbean
## 4 Aruba         1963            4.27 Americas       Latin America & Caribbean
## 5 Aruba         1964            4.06 Americas       Latin America & Caribbean
## 6 Aruba         1965            3.84 Americas       Latin America & Caribbean

We can see it has been converted into a tibble.

Step 27 : Plot fertility rates colored by continent

ggplot(data = indata) + geom_point(mapping = aes(x=Year, y = Fertility.Rates, color = Continent.Name), position = "jitter", alpha = 0.6) + labs(x = "Year", y = "Fertility Rates", title = "Fertility Rates over the years", subtitle = "Global fertility rates have decreased since 1960")

The color of each point shows its continent, so the yearly fertility rates of the different continents can be compared.

Step 28 : Plot the countries in the Middle East & North Africa region that are on the Asian continent

indata %>% filter(Region.Name == "Middle East & North Africa" & Continent.Name == "Asia") %>% ggplot() + geom_line(mapping = aes(x = Year, y = Fertility.Rates, color = Country.Name), linewidth = 1, linetype = 2)

The fertility rate of each country can be followed year by year.

Step 29 : Draw a boxplot of fertility rates for each continent

ggplot(data =  indata, mapping = aes(x = Continent.Name, y = Fertility.Rates)) + geom_boxplot() + coord_flip()

The boxplots show the distribution of fertility rates within each continent.

Step 30 : Draw a histogram with Fertility.Rates on the x-axis and the count on the y-axis

ggplot(data = indata) + geom_histogram(mapping = aes(x = Fertility.Rates), binwidth = 0.5)

The histogram shows how many observations fall into each bin of width 0.5 along the fertility rate axis.

Step 31 : Calculate the average fertility rate of each region and plot the averages as a bar graph

indata %>% group_by(Region.Name) %>% summarise(avg = mean(Fertility.Rates)) %>% ggplot(mapping = aes(x = reorder(Region.Name, -avg), y = avg, fill=Region.Name)) + geom_bar(stat = "identity") + coord_flip()

The bars are sorted by average, which makes the differences between regions easy to compare.
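
The sorting of the bars comes from reorder(), which rearranges the factor levels by a second variable. A small sketch with made-up regional averages (region names and numbers are placeholders):

```r
# Toy regional averages (made-up numbers)
region <- c("East", "North", "South")
avg    <- c(3.1, 5.4, 2.2)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating avg puts the region with the largest average first
region_sorted <- reorder(region, -avg)

print(levels(region_sorted))
```

ggplot2 draws discrete axes in level order, so changing the levels is enough to change the order of the bars.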